{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "* [Intro](#Intro)\n",
    "\t* [Probability](#Probability)\n",
    "* [Information Theory](#Information-Theory)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-15T15:58:13.785614",
     "start_time": "2017-09-15T15:58:13.778613"
    }
   },
   "source": [
    "Exploratory notebook related to the theory and introductory concepts behind probability. Includes toy examples implementation and visualization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Probability"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Probability is the science concerned with the understanding and manipulation of uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-29T09:29:11.602277",
     "start_time": "2017-09-29T09:29:07.768058"
    },
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt, animation\n",
    "\n",
    "%matplotlib notebook\n",
    "#%matplotlib inline\n",
    "\n",
    "sns.set_context(\"paper\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# interactive imports\n",
    "import plotly\n",
    "import cufflinks as cf\n",
    "cf.go_offline(connected=True)\n",
    "plotly.offline.init_notebook_mode(connected=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:20:06.128579",
     "start_time": "2017-09-16T14:20:06.122579"
    },
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class RandomVar:\n",
    "    def __init__(self, probs):\n",
    "        self.values = np.arange(len(probs))\n",
    "        self.probs = probs\n",
    "        \n",
    "    def pick(self, n=1):\n",
    "        return np.random.choice(self.values, p=self.probs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:20:15.237100",
     "start_time": "2017-09-16T14:20:15.229100"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "coin = RandomVar([0.5, 0.5])\n",
    "coin.pick()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:23:37.232654",
     "start_time": "2017-09-16T14:23:37.224653"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "biased_coin = RandomVar([0.1, 0.9])\n",
    "biased_coin.pick()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:21:06.629040",
     "start_time": "2017-09-16T14:21:06.624039"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "die = RandomVar([1/6]*6)\n",
    "die.pick()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Information Theory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We interested in understanding the amount of information related to events. For example given a random variable $x$, the amount of information of a specific value can also be seen as \"degree of surprise\" of seeing $x$ being equal to such value. \n",
    "\n",
    "$$ h(x) = - \\log_2 p(x) $$\n",
    "\n",
    "For a random variable $x$, the corresponding measure calles entropy is defines as:\n",
    "\n",
    "$$ H[x] = - \\sum_x{ p(x) \\log_2 p(x) } $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:25:13.167141",
     "start_time": "2017-09-16T14:25:13.162140"
    },
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# information content for a target probability\n",
    "def info_content(p_x):\n",
    "    return -np.log2(p_x)\n",
    "\n",
    "# entropy of a random variable probability distribution\n",
    "def entropy(p_x):\n",
    "    return -sum(p_x*np.log2(p_x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-16T14:31:05.500293",
     "start_time": "2017-09-16T14:31:05.495293"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "entropy([1/8]*8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Maximum entropy for a discrete random variable is obtained with a uniform distribution. For a continuous random variable we have an equivalent increase in entropy for an increase in the variance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-15T16:40:21.874212",
     "start_time": "2017-09-15T16:40:21.808208"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# log function\n",
    "x = np.linspace(0.00001, 2, 100)\n",
    "plt.plot(x, np.log(x), label='Log')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2017-09-15T17:21:14.463492",
     "start_time": "2017-09-15T17:21:14.391488"
    },
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#log of product equals sum of logs\n",
    "\n",
    "n = 10\n",
    "#a = np.random.random_sample(n)\n",
    "#b = np.random.random_sample(n)\n",
    "plt.plot(a, label='a')\n",
    "plt.plot(b, label='b')\n",
    "plt.plot(np.log(a), label='log(a)')\n",
    "plt.plot(np.log(b), label='log(b)')\n",
    "#plt.plot(np.log(a)+np.log(b), label='log(a)+log(b)')\n",
    "plt.plot(np.log(a*b), label='log(a+b)')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:kaggle]",
   "language": "python",
   "name": "conda-env-kaggle-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "52px",
    "width": "254px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
